Abstract: Inferring decreased or increased metabolic functions from transcript profiles is 
at first sight a bold and speculative attempt because of the functional layers in between: 
proteins, enzymatic activities, and reaction fluxes. However, the growing interest in this 
field can easily be explained by two facts: the high quality of genome-scale metabolic 
network reconstructions and the highly developed technology to obtain genome-covering 
RNA profiles. Here, an overview of important algorithmic approaches is given by means of 
criteria by which published procedures can be classified. The frontiers of the methods are 
sketched and critical voices are heard. Finally, an outlook on the prospects of the field 
is given. 
1. Introduction 

Genetic regulation is a major control mechanism of the activity of the cell's metabolic functions, 
especially on longer time scales, where a metabolic function is described as a specific metabolic 
input/output behavior of the cell. Its activity is defined as the metabolic flux related to this function, 
i.e., the consumption and production rates of specific metabolites. A general description of the flow of 
information from the genome to metabolism is as follows: the process is initiated by transcription factors, 
RNA polymerase transcribes genes into RNA, RNA is transported to the ribosome and translated into 
proteins, and after folding, post-translational modification, and transport to the site of action, proteins 
act as enzymes and transporters catalyzing the biochemical reactions and transport of molecules in the cell 
(quantified by the reaction flux, the net number of converted molecules per unit time and cell volume). The 
mechanisms of this control have already been recognized as complex and multi-level, which makes setting 
up predictive quantitative models difficult [1]. Transcription factors are controlled by mechanisms 
on different layers of the cellular system: their own transcription, translation, and post-translational 
modification, their localization, their activation by external signaling substances or the concentration of 
internal metabolites, and their combination with other transcription factors. Some mRNA species [2] 
and transcription factors [3] are transported along microtubules in a directed, controlled manner, while 
others rely on diffusion. The efficiency of translation may differ dramatically between 
genes [4]. Finally, the catalytic efficiency of different enzymes varies over six orders of magnitude [5]. 
The metabolic flux rate is determined not only by the enzymes' concentrations but also by a multitude of 
regulators, some of which change the reaction rate by several orders of magnitude [5]. For many 
processes the modifying factors have been discovered, but on the scale of the whole genome, most of 
them are unknown. 

It might seem presumptuous to propose that transcript data can be used to predict metabolic functions. 
However, these predictions have received much interest because: 

• the layer of RNA transcripts (as opposed to the layers of proteins, reaction fluxes, 
and metabolites) is the only layer where a complete quantitative snapshot of all 
molecular species is currently feasible. Reaction flux estimations currently cover only a tiny share 
of all reactions. Metabolite concentrations are currently measured for some hundreds of metabolites, 
but specific classes of metabolites such as lipids still present large challenges. Protein amount 
estimations at the genome scale are now being done, but the effort necessary is huge. The layer of 
DNA, whose information is a precondition for any transcription-based analysis, is not considered here 
because its information is qualitative, 

• transcript arrays are moderately priced in relation to the amount of data gathered, 

• the experimental effort for the researcher is moderate due to a highly automated process, 

• the technology provides low ambiguity and accurate estimates of changes in RNA amount [6]. 
The high number of probes allows the RNA of separate genes to be distinguished, with only 
few exceptions. By contrast, peak ambiguity is the main problem in estimating metabolite 
concentrations by mass spectrometry. Ambiguity is also the largest challenge in flux estimations 
based on 13C-labeled substrates, and 

• well-curated genome-scale reconstructions of the metabolic networks are available [7-10]. 

Reaction fluxes, metabolite concentrations, enzyme activities, and protein amounts are currently 
measured for only a subset of all molecular species. The measurement of protein amounts on a large 
scale is just becoming feasible with the advent of techniques such as single-shot ultra HPLC [11]. 
If all enzyme activities and metabolite concentrations were available, a much more accurate prediction 
would be possible, but that is not the case on the large scale. Thus, to judge the results of the reviewed 
studies fairly, expectations must be lowered accordingly. The systems biologist faces a trade-off 
between coverage, accuracy, and the proximity of the data to enzymatic activity: quantitative proteomics 
data would provide a better indication of enzymatic activity, but the technique does not have the 
coverage provided by transcriptome data. 

Here, studies with a primary focus on metabolism are reviewed. Other major areas of application of 
transcript data [12] are not covered, such as (i) the detection of transcriptional co-regulation, leading to 
(ii) the detection of transcription factor binding sites [13], and (iii) transcriptional biomarkers [14]. 
2. Fundamental Studies 

To demonstrate the difficulty of bridging several layers of cellular interaction, selected studies 
of the relation between one layer and the next are sketched below. 

2.1. Gene Chip Intensities → mRNA 

DNA microarray read-outs depend on the RNA concentrations but also on the varying affinity of the 
RNA to the probes. This affinity is unknown on the large scale; thus, special care is needed when analyzing 
the data [15]. Nevertheless, it is a very dependable technique [6]. For a comparison of different gene chip 
techniques, see Baldwin et al. [16]. Often, a genome-scale gene chip analysis is coupled with a more 
accurate qPCR for selected genes as a means of validation [17]. Advanced experimental techniques such 
as RNA-seq [18] and SAGE [19] allow a more accurate genome-scale quantification of RNA than gene 
array read-outs and will eventually replace them [20], but effort and price currently restrict their 
widespread use [21]. 

2.2. mRNA → Protein 

In a pioneering study by Gygi et al. [22], the correlation between 106 mRNA levels and the levels of 
the proteins they code for showed a high value of 0.935. Gygi et al. noted that the value is far lower if the 
extremely highly abundant proteins are disregarded; it can then be as low as 0.1. Furthermore, 
the relation of protein levels below the detection limit to their respective mRNA levels is obviously 
unknown; thus, for the numerous proteins that occur only in very small quantities, the relation to RNA 
levels is unknown. In a subsequent study, Griffin et al. advocated the combined consideration of both 
mRNA and protein levels to understand the regulation of central metabolic functions in yeast [23,24]. 
Tuller et al. predicted protein abundances from mRNA expression levels by taking into account 
additional information on the genes [25]. The results on the test set showed a good correlation of 
0.76. Further studies on the relation of RNA levels and protein abundances have been reviewed by 
Meier et al. [26]. In particular, the study in human cell lines [1] should be mentioned. In an experimental 
analysis of Arabidopsis, 56% of 319 protein/transcript pairs showed concurrence between transcript 
and protein, and it was suggested that post-transcriptional modification takes place for the others [27]. 
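Why a few extremely abundant proteins can inflate a correlation coefficient, as Gygi et al. observed, can be reproduced with a few lines of Python. The six mRNA/protein pairs below are invented for illustration and are not taken from [22]; the sketch only shows how one dominant pair drives a Pearson correlation.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical mRNA and protein levels (arbitrary units). The last gene is
# extremely highly expressed at both levels and dominates the linear correlation.
mrna = [1, 2, 3, 4, 5, 100]
protein = [5, 1, 4, 2, 3, 100]

r_all = pearson(mrna, protein)
r_without_top = pearson(mrna[:-1], protein[:-1])
print(f"all genes:             r = {r_all:.3f}")          # r = 0.998
print(f"without most abundant: r = {r_without_top:.3f}")  # r = -0.300
```

Rank-based measures such as Spearman's correlation are less sensitive to such extreme values, which is one reason they are often reported alongside Pearson's coefficient.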

Mechanistically, the relation between RNA and protein concentrations can be seen as the interplay 
of three aspects: the life spans of (i) RNA and (ii) proteins, and (iii) the translation efficiency at 
the ribosome [4]. In a groundbreaking work, Schwanhäusser et al. studied the life cycles of RNA and 
proteins as well as translation in mammalian fibroblasts and found "that the cellular abundance of proteins 
is predominantly controlled at the level of translation" [28]. The rates of mRNA synthesis and decay in 
yeast in response to stress have been measured [29]. The life span of proteins in vivo has been assessed 
on a large scale in yeast [30] and for selected glycolytic enzymes in mammalian cells [31]. 
2.3. Enzyme Concentration → Enzyme Activity 

The enzyme activity (the maximal catalytic rate v_max for a given cell volume) depends on the enzyme 
concentration. Mostly, the relation is approximately linear in a predefined environment; the ratio is 
called the turnover number (k_cat). The turnover numbers of enzymes (together with other kinetic 
parameters) have been estimated for many enzymes, comprehensively reviewed, and made available in public 
databases [5,32]. With respect to the set of all enzymes, this information is far from complete. Turnover 
numbers have been measured under different conditions (pH, temperature, and the concentrations of 
activators and inhibitors), and the resulting values vary considerably for one enzyme. Some enzymes 
are nine orders of magnitude more efficient than others (minimal vs. maximal turnover numbers in [5]). 
Considering these data, the variability of this step in the chain from RNA to metabolic flux is greater than 
that of any other step. 
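The linear relation can be written as v_max = k_cat · [E]. The short calculation below uses hypothetical turnover numbers and a hypothetical enzyme concentration to contrast the effect of a 2-fold change in enzyme amount with the spread of turnover numbers between enzymes, which is the point made in the last sentence.

```python
import math

def v_max(kcat_per_s, enzyme_mM):
    """Maximal rate (mM/s): turnover number (1/s) times enzyme concentration (mM)."""
    return kcat_per_s * enzyme_mM

# Hypothetical turnover numbers (1/s) for three enzymes.
kcat = {"slow": 1e-2, "typical": 1e1, "fast": 1e5}
enzyme_mM = 1e-3  # same (hypothetical) enzyme concentration for all three

for name, k in kcat.items():
    print(f"{name:8s} v_max = {v_max(k, enzyme_mM):.1e} mM/s")

# Doubling the enzyme amount (e.g., a 2-fold transcript increase fully passed
# on to the protein) doubles v_max of that enzyme ...
ratio_enzyme = v_max(1e1, 2 * enzyme_mM) / v_max(1e1, enzyme_mM)
# ... whereas k_cat differs between these enzymes by several orders of magnitude.
orders_kcat = math.log10(max(kcat.values()) / min(kcat.values()))
print(f"enzyme doubling: x{ratio_enzyme:.1f}; k_cat range: {orders_kcat:.1f} orders of magnitude")
# enzyme doubling: x2.0; k_cat range: 7.0 orders of magnitude
```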

2.4. Enzyme Activity → Metabolic Flux 

The prediction of metabolic fluxes from enzyme activity information (and the concentrations of reactants, 
products, and other metabolic species) has been extensively studied in the field of kinetic modeling, 
and its results are available in public databases [33,34]. A main challenge in understanding this relation 
is the interplay of metabolite concentrations, enzyme levels, and reaction fluxes in a highly connected network. 
The network effect, defined as the difference between the fluxes of reactions proceeding simultaneously 
in the network and the fluxes of the same reactions in isolation, modifies the activity-flux relation. It is 
studied in metabolic control analysis [35-37]. In extreme cases, it can lead to paradoxical situations in 
which an increased enzyme amount leads to a lower flux through the same reaction. 
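Metabolic control analysis quantifies the network effect with flux control coefficients C_i = (∂J/J)/(∂e_i/e_i). The sketch below assumes a hypothetical two-step pathway S → X → P with S and P held constant, a reversible first step v1 = e1(k1·S − k2·X) and an irreversible second step v2 = e2·k3·X (mass-action kinetics and all parameter values chosen for illustration). It estimates both coefficients by finite differences and checks the summation theorem, which states that the flux control coefficients of a pathway sum to 1.

```python
import math

def steady_state_flux(e1, e2, k1=1.0, k2=0.5, k3=2.0, s=1.0):
    """Steady-state flux J of S -> X -> P with v1 = e1*(k1*S - k2*X)
    and v2 = e2*k3*X; S and P are held constant."""
    x = e1 * k1 * s / (e1 * k2 + e2 * k3)  # solves v1 = v2 for X
    return e2 * k3 * x

def control_coefficient(flux, enzymes, i, h=1e-6):
    """Scaled sensitivity d ln J / d ln e_i by central differences in log space."""
    up, down = list(enzymes), list(enzymes)
    up[i] *= math.exp(h)
    down[i] *= math.exp(-h)
    return (math.log(flux(*up)) - math.log(flux(*down))) / (2 * h)

enzymes = [1.0, 1.0]
c1 = control_coefficient(steady_state_flux, enzymes, 0)
c2 = control_coefficient(steady_state_flux, enzymes, 1)
print(f"C_e1 = {c1:.3f}, C_e2 = {c2:.3f}, sum = {c1 + c2:.3f}")
# C_e1 = 0.800, C_e2 = 0.200, sum = 1.000
```

In this toy pathway, a 1% increase of e2 raises the flux by only about 0.2%, because the accompanying drop in X partly compensates: the activity-flux relation of a single enzyme is attenuated by the rest of the network.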

2.5. Crossing Several Layers 

Hancock et al. analyzed the relation of RNA abundance to metabolite concentrations in combination 
with the topological structure of the network. Based on the clustering of correlated genes, their 
approach allows the identification of hub reactions that depend on a specific change of condition, which 
subsequently leads to a minimal set of commonly controlled metabolites. Their results support the 
hypothesis that the gene expression response (to different forms of stress in E. coli in this case) targets 
a small number of metabolites, which consequently entails a large-scale change in metabolism [38]. 

Kharchenko et al. found that the highest co-expression of metabolic genes occurs in simple motifs 
of the metabolic network; in other words, "regulation of metabolic genes is local" [39]. Çakır et al. 
studied the transcriptional adaptation of yeast to different growth media. They calculated optimal transcript 
ratios on the basis of elementary flux modes [40], and the comparison with real transcript ratios showed high 
agreement [41]. This result, compared with other studies showing less agreement, leads to the 
conclusion that adaptation to different substrates is a special case. 

Hajduch et al. compared the proteomes of different oilseeds to reveal differences in intermediary 
metabolism, and their analysis showed a divergent use of malate as a precursor for lipids [42]. 
Saito et al. reviewed studies using transcript and metabolite co-occurrence for various applications in 
plant biology [43]. Ishihama et al. performed a large-scale proteomic screening of E. coli and found 
that, among the enzymes, only proteins involved in energy metabolism are highly abundant [44]. 

Of particular interest are studies that measured RNA, protein, fluxes, and metabolite concentrations 
in parallel in the same experiment [45,46]. The common finding of these studies is that there is no 
high overall correlation between the abundance of an RNA and the coded protein, between an enzyme and 
the catalyzed flux, or between metabolite concentrations and the levels of the enzymes that catalyze 
their conversion. However, looking at the regulation of selected metabolic paths and functions, in almost 
all cases the pattern of abundance changes of RNA and protein is in accordance with the observed changes in 
reaction fluxes and metabolite concentrations. In summary, although RNA levels have little direct predictive 
power for reaction fluxes, the transcriptional regulation of metabolic functions can still be observed 
in the RNA abundance data. 

2.6. mRNA → Fluxes 

Summarizing an early attempt to relate transcript values to metabolic fluxes, ter Kuile 
expressed "strong doubts on whether transcriptome and proteome analysis suffices to assess biological 
function" [47]. The authors of subsequent approaches have concluded that transcript profiles must be 
used in conjunction with other information to yield meaningful results. 

Moxley et al. [48] correlated the fluxes (estimated by tracer experiments) with the respective RNA 
levels and found a correlation of only 0.07, which could be increased to 0.8 by using a 
network-based model from which a parameter called "metabolite interaction density" is calculated. This 
density is used as a modifier for the flux prediction from RNA levels. The conclusion of this study is 
that the metabolic network must be taken into account to derive a predictive relation from transcript 
abundances to fluxes. 

Yang et al. studied gene expression in Synechocystis in combination with 13C isotope-based 
flux measurements and emphasized the importance of integrating transcript and flux data for 
understanding regulatory mechanisms [49]. 

Metabolites 2012, 2 

Daran-Lapujade et al. studied the role of "hierarchical" flux regulation (by changed enzyme activity, 
e.g., through transcriptional regulation) versus metabolic regulation (change of flux due to changed 
metabolite concentrations) for glycolytic enzymes in yeast [50]. Factor analysis showed that transcriptional 
regulation was responsible for only 20%-50% of the observed flux changes. A similar analysis [51] 
led to the assignment of different roles to the regulated enzymes in yeast glycolysis: the regulation of 
some is predominantly hierarchical, for others it is predominantly metabolic. For some, the regulation is 
cooperative between both, and for others it is antagonistic. In an earlier study, they compared other central 
metabolic pathways and found strong qualitative correspondence between transcript and flux changes for 
maltose metabolism, partial correspondence for the triose-phosphate cycle and the pentose-phosphate 
pathway, and little correspondence for glycolysis [52]. Their results put the prediction methods reviewed 
in the next section into perspective. However, glycolysis is a rather special pathway due to its large 
enzyme concentrations. Its fast response (for instance, to the sudden loss of membrane potential due to a 
rupture) is absolutely necessary, as ATP depletion leads to rapid cell death. Transcriptional regulation is 
too slow for this life-saving response. Furthermore, rapid growth of yeast on glucose-rich media is an 
extreme condition rarely found in vivo; thus, it is likely that the structure of the metabolic system is not 
optimized for this situation. Their findings regarding glycolysis therefore do not seem sufficient to 
discard the idea of observing metabolic changes from transcript data for metabolism as a whole. 
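The distinction between hierarchical and metabolic regulation is often expressed with regulation coefficients. A minimal sketch, assuming the common formulation ρ_h = Δln(v_max)/Δln(J) for the hierarchical part and ρ_m = 1 − ρ_h for the metabolic part; the cited studies may differ in detail, and the tolerance and the flux and v_max values are hypothetical.

```python
import math

def regulation_coefficients(flux_ref, flux_new, vmax_ref, vmax_new):
    """Hierarchical (rho_h) and metabolic (rho_m) regulation coefficients,
    with rho_h = d ln(vmax) / d ln(J) and rho_m = 1 - rho_h."""
    d_ln_j = math.log(flux_new / flux_ref)
    if d_ln_j == 0:
        raise ValueError("flux did not change; coefficients are undefined")
    rho_h = math.log(vmax_new / vmax_ref) / d_ln_j
    return rho_h, 1.0 - rho_h

def classify(rho_h, tol=0.1):
    """Coarse labels; the tolerance is an arbitrary illustrative choice."""
    if abs(rho_h - 1.0) <= tol:
        return "hierarchical"
    if abs(rho_h) <= tol:
        return "metabolic"
    if 0.0 < rho_h < 1.0:
        return "cooperative"   # both mechanisms push the flux in the same direction
    return "antagonistic"      # rho_h > 1 or < 0: one mechanism counteracts the other

# Hypothetical enzyme: the flux doubles while the measured vmax rises ~1.41-fold.
rho_h, rho_m = regulation_coefficients(1.0, 2.0, 1.0, math.sqrt(2.0))
print(round(rho_h, 3), round(rho_m, 3), classify(rho_h))  # 0.5 0.5 cooperative
```

A "20%-50%" contribution of transcriptional regulation, as reported above, would correspond to ρ_h values between 0.2 and 0.5 under this formulation.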

2.7. Regulation of Metabolic Genes 

Which metabolic genes are regulated at all? Wessely et al. analyzed transcript profiles of E. coli 
and found that pathways (i.e., the sets of biochemical reactions necessary to perform a specific metabolic 
conversion) associated with high protein cost are "controlled by fine-tuned transcriptional programs", 
whereas those with low protein cost are regulated only in key reactions [53]. 

And how are the genes controlled by their transcription factors? In the transcription factor network 
of E. coli, a hierarchy of general and specific transcription factors has been found, and each metabolic 
function is controlled by a distinct combination of them. Enzymes catalyzing sequential reactions are 
co-regulated by the same transcription factors, while the regulation at junctions of the metabolic network 
is more complex [54]. Notebaart et al. found an interesting fact that argues for analyzing a metabolic 
network with respect to metabolic functions rather than its graph structure alone: 
"The co-regulation of metabolic genes is better explained by flux coupling than by network distance" in 
E. coli [55]. 

2.8. Genetic Interactions 

In the studies reviewed so far, the focus was the correlation between an individual RNA and the 
corresponding protein, flux, or concentration. Another form of interaction, called epistasis, has 
also been modeled in the context of metabolic networks. An epistatic interaction occurs if the phenotypic 
impact of the knockout of one gene depends on whether another gene is knocked out [56]. Such an interaction 
might be caused by redundant reaction paths in the metabolic network, in which case it can be predicted 
by network-based approaches [57-60]. One common finding is that most epistatic interactions are 
restricted to certain conditions [57,59]. Potentially, the verified set of epistatic interactions can be used 
for a more accurate interpretation of transcript profiles. Szappanos et al. studied genetic interactions 
among metabolic genes in yeast using the flux-balance framework [61]. They found many "instances 
of genetic interactions ... not explained by the structure of the metabolic network", indicating that this 
is one more complicating factor that has to be taken into account in the mechanistic description of the 
transcriptional regulation of metabolism. 
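How redundant reaction paths produce an epistatic interaction can be sketched with a hypothetical toy network in which a substrate reaches a biomass precursor through two parallel routes, each catalyzed by the product of one gene. The sketch assumes the multiplicative definition ε = W_ab − W_a·W_b (fitness relative to the wild type), which is one common choice; the network is far simpler than the flux-balance models of [57-61].

```python
def max_biomass_flux(uptake_cap, route_caps, knocked_out=()):
    """Maximal biomass flux of a toy network in which the substrate is converted
    to a biomass precursor via parallel routes (one gene per route); the
    fluxes of parallel routes add up, and the total is limited by uptake."""
    available = sum(cap for gene, cap in route_caps.items() if gene not in knocked_out)
    return min(uptake_cap, available)

def epistasis(uptake_cap, route_caps, a, b):
    """Multiplicative epistasis score eps = W_ab - W_a * W_b."""
    wt = max_biomass_flux(uptake_cap, route_caps)
    w_a = max_biomass_flux(uptake_cap, route_caps, {a}) / wt
    w_b = max_biomass_flux(uptake_cap, route_caps, {b}) / wt
    w_ab = max_biomass_flux(uptake_cap, route_caps, {a, b}) / wt
    return w_ab - w_a * w_b

# Two fully redundant routes with hypothetical capacities.
routes = {"geneA": 10.0, "geneB": 10.0}
print(epistasis(10.0, routes, "geneA", "geneB"))  # -1.0
```

Each single knockout leaves the biomass flux unchanged, while the double knockout abolishes it (a synthetic lethal pair), which gives the strongly negative score.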

3. Systematic Comparison of Methods 

To systematically assess the multitude of studies relating RNA profiles to metabolism, criteria are 
given below to distinguish how the profiles are used. 
3.1. Absolute/Relative/Co-expression 

Expression profiles can be used in several ways. (i) Expression profiles can be used directly 
to assess a single state, which is called absolute, e.g., to decide whether a gene is active [62]. (ii) 
Differential expression profiles can be used to differentiate between states (changed conditions, time 
series); normally, logarithmic expression values are subtracted, which is called relative, e.g., to quantify 
changed metabolic activities [63]. (iii) A third alternative is to analyze the correlation of expression 
changes for each pair of genes, called co-expression, e.g., to determine which metabolic paths are 
controlled concertedly [64]. 

Absolute expression profiles are widely used to predict the active regions of metabolic 
networks [62,65-69]. Absolute expression profiles are also used for network reconstruction [7,8,70]: 
if a particular gene is expressed in at least one of a large number of expression profiles of a particular 
cell type, then the reaction catalyzed by its gene product, or the transport process it facilitates, can be 
considered part of the network [71]. 

Relative expression profiles are often analyzed simply by counting the number of up- or 
down-regulated genes, using a threshold on the ratio (e.g., more than 2-fold change) or on the significance 
level (e.g., using a t-test), with respect to classifications such as Gene Ontology [72] or KEGG maps [73]. 
However, a quantitative prediction of the change in the metabolic mode of operation has also been 
demonstrated [63,74]. To cope with the non-linear relationship between transcript change and enzyme 
activity change, a ranking approach called Differential Rank Conservation (DIRAC) has been successfully 
applied [75]. 
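The counting approach can be sketched as follows: relative expression is computed as a difference of log2 values, and genes passing a fold-change cutoff are counted per pathway. The 2-fold cutoff, the expression values, and the plain dictionary used as pathway annotation are hypothetical; a real analysis would typically add a replicate-based significance test.

```python
import math
from collections import Counter

def log2_fold_changes(control, treated):
    """Relative expression: difference of log2 expression values per gene."""
    return {g: math.log2(treated[g]) - math.log2(control[g]) for g in control}

def count_regulated(lfc, gene_to_pathway, min_fold=2.0):
    """Count up- and down-regulated genes per pathway using a fold-change cutoff."""
    cutoff = math.log2(min_fold)
    up, down = Counter(), Counter()
    for gene, value in lfc.items():
        pathway = gene_to_pathway.get(gene)
        if pathway is None:          # gene without pathway annotation
            continue
        if value >= cutoff:
            up[pathway] += 1
        elif value <= -cutoff:
            down[pathway] += 1
    return up, down

# Hypothetical expression values and pathway annotation.
control = {"g1": 100, "g2": 50, "g3": 80, "g4": 40}
treated = {"g1": 400, "g2": 55, "g3": 20, "g4": 90}
pathways = {"g1": "glycolysis", "g2": "glycolysis", "g3": "TCA cycle", "g4": "TCA cycle"}

up, down = count_regulated(log2_fold_changes(control, treated), pathways)
print(dict(up), dict(down))  # {'glycolysis': 1, 'TCA cycle': 1} {'TCA cycle': 1}
```

The output also shows a weakness of pure counting: the TCA cycle appears both up- and down-regulated, so the direction of the metabolic change remains unclear without further modeling.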

Expression correlations are used to determine which genes are commonly regulated, for instance to 
predict transcription factors. Metabolic pathways with a high correlation among the genes coding for the 
necessary enzymes can be considered a functional mode of operation in a particular cell type [64,76,77]. 
Ihmels et al. analyzed the co-expression of genes coding for enzymes and found higher correlations along 
linear reaction paths between branch points, as well as a hierarchical modularity of the regulation [78]. 
Loraine demonstrated the use of the gene clustering tool CressExpress for metabolic genes [79]. 
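A simple pathway-level co-expression score is the mean pairwise correlation of log expression values across conditions. The sketch below is a hypothetical illustration of this idea, not the scoring used in [64,76,77]; the gene names and expression values are invented.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def mean_pairwise_coexpression(profiles, genes):
    """Average Pearson correlation over all gene pairs of a pathway,
    computed on log2 expression values across conditions."""
    logs = {g: [math.log2(v) for v in profiles[g]] for g in genes}
    pairs = list(combinations(genes, 2))
    return sum(pearson(logs[a], logs[b]) for a, b in pairs) / len(pairs)

# Hypothetical expression of three genes across four conditions.
profiles = {
    "geneA": [10, 20, 40, 80],
    "geneB": [5, 10, 20, 40],    # changes in concert with geneA
    "geneC": [80, 40, 20, 10],   # changes in the opposite direction
}
print(mean_pairwise_coexpression(profiles, ["geneA", "geneB"]))           # close to 1
print(mean_pairwise_coexpression(profiles, ["geneA", "geneB", "geneC"]))  # close to -1/3
```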

3.2. Thresholds 

The distinction between active and inactive genes is crucial for all methods using absolute 
expression profiles. 

Hebenstreit et al. provided strong evidence that there is in reality a clear distinction between genes that 
are expressed and those that are not expressed (in the sense that the gene product is present in sufficient 
abundance to take effect in the cell) [80]. The observed RNA concentrations follow a bimodal distribution, 
reflecting one normal distribution each for expressed and non-expressed genes. 
To decide whether a gene is considered active or not, a threshold is the method of choice. As the RNA 
abundance levels of inactive and active genes overlap [80], methods applying the threshold 
must be robust enough to cope with a certain fraction of wrongly assigned activities. This robustness 
will also allow the use of transcript data that do not accurately represent RNA counts. An optimal 
threshold can be calculated based on a comprehensive analysis of a gene chip in conjunction with 
proteomics data. However, such experiments are mostly considered too elaborate, and the threshold is set 
heuristically. Instead, the approach is validated by its overall predictive power. 

The negative effect of uncertainty about the optimal threshold value is reduced by its "soft" application. 
For instance, in the GIMME algorithm [62], the threshold is applied in such a way that an expression level 
below the threshold entails a gradual (linear) penalty for activity of the assigned reaction. Thus, a 
reaction assigned to a gene expressed at a lower level than the threshold can still be considered active, but 
the total amount of these errors is minimized. In the iMAT approach [67,68,81], the threshold application 
is softened by the introduction of two threshold values. The upper threshold separates the genes highly 
likely to be active, while the lower threshold separates the genes highly likely to be inactive, leaving a 
range of expression values without a clear attribution. As there is still no guarantee of avoiding incorrect 
gene assignments, an optimization is used in which the clearly active genes receive a bonus and the clearly 
inactive genes a penalty. More sophisticated is the MADE approach [74], which avoids the arbitrariness of 
heuristic threshold setting. For each gene, a single but flexible threshold is calculated from a set of 
expression profiles by identifying the largest gap between values. 
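The three ways of softening or deriving a threshold described above can be sketched in a few lines. This is a hypothetical illustration that follows the descriptions in the text, not the published GIMME, iMAT, or MADE implementations: the GIMME-style weight is assumed to be the distance below the threshold, the iMAT-style step only classifies reactions (the bonus/penalty optimization is omitted), and the gap-based threshold is the midpoint of the largest gap between sorted values of one gene.

```python
def gimme_penalties(expression, threshold):
    """GIMME-style linear penalty: reactions whose gene is expressed below the
    threshold get weight (threshold - expression); all others get 0."""
    return {rxn: max(0.0, threshold - x) for rxn, x in expression.items()}

def two_threshold_states(expression, lower, upper):
    """iMAT-style three-way classification with a lower and an upper threshold."""
    states = {}
    for rxn, x in expression.items():
        if x >= upper:
            states[rxn] = "active"
        elif x <= lower:
            states[rxn] = "inactive"
        else:
            states[rxn] = "undetermined"
    return states

def largest_gap_threshold(values):
    """Threshold at the midpoint of the largest gap between sorted values."""
    v = sorted(values)
    if len(v) < 2:
        raise ValueError("need at least two values")
    i = max(range(len(v) - 1), key=lambda k: v[k + 1] - v[k])
    return (v[i] + v[i + 1]) / 2

# Hypothetical expression values (arbitrary units) mapped to reactions.
expr = {"R1": 12.0, "R2": 3.0, "R3": 7.5, "R4": 0.5}
print(gimme_penalties(expr, threshold=5.0))
# {'R1': 0.0, 'R2': 2.0, 'R3': 0.0, 'R4': 4.5}
print(two_threshold_states(expr, lower=2.0, upper=10.0))
# {'R1': 'active', 'R2': 'undetermined', 'R3': 'undetermined', 'R4': 'inactive'}

# One gene measured in six hypothetical profiles: a low and a high cluster.
print(largest_gap_threshold([0.5, 1.0, 1.5, 6.0, 7.0, 8.0]))  # 3.75
```

In GIMME, such weights would then enter a linear program that minimizes the weighted sum of fluxes while a required metabolic function remains feasible; the snippet only shows how the soft threshold turns expression values into penalties.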

In other approaches, the setting of a threshold is completely circumvented and the expression values 
are used in a continuous way [69,82,83]. 

3.3. Representation of the Metabolic System 

The way the metabolic system is represented is another important aspect of the methods. Mostly, the 
system is represented by the metabolic network, which consists of the metabolites and the biochemical 
reactions that convert the metabolites in fixed quantities, the stoichiometric factors; it is thus called a 
stoichiometric model. Often a stoichiometric model is used to compute flux distributions in the 
flux-balance framework [62,65,69,74]. A different approach is to use metabolic paths (short linear 
chains of reactions), which do not necessarily form a complete network [64,84]. An alternative way to 
represent the metabolic system is to compute the set of elementary flux modes first [40] and to perform 
the analysis using these flux distributions [41]. The decomposition of the total flux as a convex sum of 
minimal flux modes [85], parameterized by gene expression, has also been proposed [63]. The flux-balance 
framework is equivalent to a bipartite graph (Petri net [86]), but simplified graphs such as the 
adjacency graph [87] have also been used. 
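A minimal flux-balance sketch, assuming SciPy's `linprog` as the LP solver and a hypothetical four-reaction network: the flux vector v must satisfy the steady-state condition S·v = 0 and the flux bounds, and the biomass flux is maximized.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical toy network: uptake -> A, A -> B via two isoenzymes (R1, R2),
# B -> biomass. Rows are metabolites, columns are reactions.
reactions = ["R_up", "R1", "R2", "R_bio"]
S = np.array([
    [1, -1, -1,  0],   # metabolite A
    [0,  1,  1, -1],   # metabolite B
])
# All reactions irreversible; substrate uptake capped at 10 flux units.
bounds = [(0, 10), (0, 1000), (0, 1000), (0, 1000)]

# linprog minimizes c @ v, so maximizing the biomass flux means c = -1 for R_bio.
c = np.zeros(len(reactions))
c[reactions.index("R_bio")] = -1.0

# The steady state S @ v = 0 enters as equality constraints.
res = linprog(c, A_eq=S, b_eq=np.zeros(S.shape[0]), bounds=bounds, method="highs")
assert res.success, res.message

print("max biomass flux:", -res.fun)  # equals the uptake limit of 10
print({r: round(float(v), 6) for r, v in zip(reactions, res.x)})
```

The optimal split of flux between R1 and R2 is not unique: any combination summing to 10 is optimal. Such degeneracy is one reason why expression data are used as additional penalties or bounds in the methods discussed in Section 3.4.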

In a simplification of the stoichiometric model, the stoichiometric factors are ignored [88]. 
Hancock et al. used such a graph representation, in which the nodes are the metabolites and, for every 
biochemical reaction, an edge is drawn from each substrate to each product [38]. 
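The graph construction described here can be sketched as follows. The two reactions are a hypothetical excerpt; for simplicity, all reactions are treated as irreversible (a reversible reaction would need edges in both directions).

```python
from collections import defaultdict

def metabolite_graph(reactions):
    """Directed metabolite graph ignoring stoichiometry: for every reaction,
    add an edge from each substrate to each product."""
    graph = defaultdict(set)
    for substrates, products in reactions.values():
        for s in substrates:
            for p in products:
                graph[s].add(p)
    return graph

# Hypothetical reactions as (substrates, products); stoichiometric factors dropped.
reactions = {
    "HEX": (["glucose", "ATP"], ["G6P", "ADP"]),
    "PGI": (["G6P"], ["F6P"]),
}
g = metabolite_graph(reactions)
print(sorted(g["glucose"]))  # ['ADP', 'G6P']
print(sorted(g["ATP"]))      # ['ADP', 'G6P']
```

The example also shows a known side effect of this representation: cofactors such as ATP and ADP become connected to many metabolites and create shortcuts, which is why such highly connected compounds are often removed in practice.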

3.4. Type of Inference 

A flux distribution or an active subnetwork can be computed by penalizing fluxes belonging to inactive 
genes and/or rewarding nonzero fluxes belonging to active genes (in other words, by maximizing the binary 
compliance with the expression profile) [62,67,81,89]. This approach has also been used as a secondary 
criterion in constrained flux-balance optimization [68]. Expression data have been used to define upper 
bounds on fluxes in a flux-balance computation [69]. Another possibility is to use the expression values to 
define target values for the fluxes and to minimize the quadratic deviation from them [82]. A similar method 
based on error minimization, developed for protein levels [90], can in principle also be applied to expression 
data. A multi-layer probabilistic framework called PROM mainly integrates a metabolic network with 
a transcriptional regulatory network but is also capable of using transcript data [83]. Its basic idea is to 
assign probability values to gene states. Expression profiles have been used to rank reaction paths [64] 
or, similarly, "metabolic modules" [91]. Based on the textbook pathway definition (e.g., as implemented in 
KEGG [73]), expression values have been used to score pathways in a framework called differential rank 
conservation [75,92,93]. The clustering of gene sets, very common in the elucidation of transcription 
factors, has also been applied in conjunction with metabolic functions [87]. Graph-theoretical 
inference has also been used [94]. The topology of the metabolic network is the starting point for finding 
so-called regulatory signatures, i.e., patterns of gene changes indicating a diseased state (type 2 diabetes 
mellitus in this case) [95]. 
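Using expression data as upper flux bounds can be sketched with the same kind of toy network as a flux-balance problem. This sketch is only in the spirit of [69]: it assumes bounds proportional to each gene's expression relative to the most highly expressed gene, and the scaling factor, the network, and the expression values are hypothetical; the published method may normalize differently.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical toy network: uptake -> A, A -> B via R1 or R2, B -> biomass.
reactions = ["R_up", "R1", "R2", "R_bio"]
S = np.array([
    [1, -1, -1,  0],   # metabolite A
    [0,  1,  1, -1],   # metabolite B
])
# Hypothetical expression of the genes catalyzing R1 and R2 (arbitrary units).
expression = {"R1": 30.0, "R2": 10.0}
scale = 6.0  # hypothetical factor mapping relative expression to flux units

def expression_bound(rxn):
    """Upper flux bound proportional to expression relative to the maximum."""
    if rxn not in expression:
        return 1000.0  # no gene data: effectively unbounded
    return scale * expression[rxn] / max(expression.values())

# Measured uptake capped at 10; the other bounds are derived from expression.
bounds = [(0, 10)] + [(0, expression_bound(r)) for r in reactions[1:]]

c = np.zeros(len(reactions))
c[reactions.index("R_bio")] = -1.0  # maximize biomass flux
res = linprog(c, A_eq=S, b_eq=np.zeros(S.shape[0]), bounds=bounds, method="highs")
assert res.success, res.message
print("max biomass flux:", -res.fun)  # limited to 6 + 2 = 8 by the expression-derived bounds
```

Compared with an unconstrained flux-balance computation, the low expression of the R2 gene now caps the predicted flux below the uptake limit.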

Gene set analysis [96] can be applied to metabolic pathways [97] as a distinct approach to using 
transcript correlation. A common technique to evaluate transcript profiles is to count up-/down-regulated 
genes (with a significance threshold); this can also be applied to KEGG pathway maps [73] or GO 
terms [72] to estimate the emphasis on certain functional characterizations [98]. 
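Counting regulated genes per pathway is often turned into an over-representation test. A minimal sketch using the one-sided hypergeometric test, one common choice that is not necessarily the procedure of [98]; the gene counts are hypothetical, and the correction for testing many pathways at once is omitted.

```python
from math import comb

def overrepresentation_p(n_total, n_in_set, n_selected, n_selected_in_set):
    """One-sided hypergeometric p-value P(X >= k): probability that at least k
    of n selected (e.g. up-regulated) genes fall into a gene set of size K
    when n genes are drawn without replacement from N genes."""
    N, K, n, k = n_total, n_in_set, n_selected, n_selected_in_set
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)

# Hypothetical numbers: 4000 measured genes, 25 in a KEGG pathway,
# 200 up-regulated genes, 8 of which belong to the pathway
# (about 1.25 would be expected by chance).
p = overrepresentation_p(4000, 25, 200, 8)
print(f"p = {p:.2e}")
```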

3.5. Biological Focus of Studies 

Methods applying transcript data to metabolism can have many different aims. As a distinguishing 
characteristic, some studies aim to lay theoretical foundations, while others directly target specific 
biological questions. 

For the first category, the reconstruction of a metabolic network for a specific cell with the help of 
transcript data can be mentioned [71]. Once a universal metabolic network has been reconstructed (such as 
that of the universal human cell [7]), the subnetwork of reactions in a specific cell type can be obtained 
with the same approach [8,67,81,89]. Similarly, transcript data are also used to estimate the set of active 
reactions in a particular state [68,69,82,99]. From the set of active reactions in a particular state, the 
essential information can be extracted in a further processing step, such as the so-called flux phenotypes [48] 
or, similarly, the metabolic state [90]. The detection of novel metabolic pathways [100,101] is an application 
in the area of fundamental biochemistry. 

A number of studies try to understand regulation patterns by analyzing the 
co-expression of metabolic genes in a large number of transcript profiles [38,39,54,55,78]. These 
regulation patterns can lead to the prediction of the transcription factors of one or several genes. Reed 
and Palsson analyzed the connection between correlated genes and coupled reactions [102]. 

Some applications of transcript data are directly related to clinical questions, such as the prediction 
of biomarkers [17,94], the prediction of drug targets [67,68,103], the identification of reporter metabolites 
in type 2 diabetes [95], the study of the effects of a drug such as baicalein [104], and the identification of 
diet effects [105]. The search for target metabolites of regulation (i.e., concerted regulation of genes to 
change the concentration of a certain metabolite) was the focus of another study [38]. If the underlying 
hypothesis of this study also holds for organisms other than E. coli, this method would open a 
path to identifying biomarkers in biotechnology and medicine. 
Studies often have an explicit biotechnological focus, for instance plant strain optimization [106], 
optimization of bacterial production rates [107,108], optimization of algal growth [109,110], and 
understanding seed filling in oilseeds [42]. 

4. Available Software 

The threshold-based activity prediction method GIMME [62] (and the closely related iMAT [81]; see 
Section 3.2 for the difference) is widely used, as it requires only minimal preconditions: a functional 
stoichiometric model and a few transcript profiles suffice. Without any further requirement, it can be 
applied to predict exometabolic fluxes. As these fluxes are often known, they can be used to increase 
the reliability of the model, either in a multi-step algorithm that ensures concordance with the input/output 
fluxes [62,81] or directly in the flux-balance optimization [68]. 

These expression-based prediction methods have been implemented in the universal flux computation 
frameworks COBRA [111] and FASIMU [112]. For the iMAT method [81], a standalone implementation 
is available [113]. The software of Lee et al. for the quantitative application of transcript data to flux 
prediction is also freely available [82]. The TIGER toolbox [114] can be recommended if transcriptional 
regulation should also be taken into account. If a large number of transcript profiles are available and 
transcriptional networks should also be modeled, the freely available probabilistic framework PROM 
[83] can be recommended. If quite a number of transcript profiles are available, the threshold value can 
be adjusted: it can be calculated individually with an optimization using MADE, which is also freely 
available [74]. 

To analyze correlations between the expression of different genes in transcript profiles with respect to 
metabolic paths, the PathRanker method [64] offers a freely available implementation. It does not require 
a functional stoichiometric model but needs large profile sets to work reliably. 

5. Conclusions and Outlook 

Inferring metabolic activity changes from transcript profiles is justified in two ways: mechanistically 
and by the assumption of evolutionary optimality. The former is based on the fact that RNA is translated 
into proteins, which then work as enzymes or transporters, thus modifying the metabolic flux related to the 
function. The latter is based on the argument that if the cell undertakes the effort to increase the mRNA 
production rate, it does so only for a purpose (related to the philosophical concept of final cause). The 
most likely purpose is to enhance a function for which the coded protein is required. 

As the direct correlation of transcript profiles with metabolic reaction fluxes is not high, a wide 
range of methods with different strengths and weaknesses has been applied. The critical question is whether a 
particular method is suited to a particular application. 

There is a clearly recognizable trend toward enriching the applied methods with available 
knowledge, as this is the only way to increase their predictive power. 

As an outlook for the field, it is foreseeable that large-scale metabolomics, proteomics, fluxomics, and 
enzyme characterization will become more manageable and affordable, and the need to bridge the wide 
distance from transcript to metabolism will vanish. The methods can then be improved with mechanistic 
descriptions of the underlying processes as soon as these are discovered. The methods crossing several 
layers will have to include more components, as it will become possible to parametrize them using experimental 
data. Genome-scale quantitative proteomics is on the brink of being widely available and feasible [11]. 
Quantitative metabolomics has become feasible for hundreds of metabolite species. The developers 
of the reviewed methods and the users of their results will adopt these data when coverage, cost, or 
accuracy make it viable to do so. At the current time, mRNA data are simply the most 
applicable means. 